Detailed protocol


Required software: 
1. Linux operating system
2. STAR >= 2.5.2b
3. Drop-seq tools v1.13
4. Trimmomatic >= 0.36
5. FastQC >= 0.11.5
6. Python >= 3.6.0
7. pysam >= 0.15.0
8. Bpipe >= 0.9.10
9. R software >= 3.6.0
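
As a quick sanity check on the software list above, the following is a minimal sketch that reports which command-line tools are reachable on PATH. The command names (STAR, fastqc, python3, bpipe, Rscript) are assumptions about how the tools are installed on your system. Drop-seq tools, Trimmomatic and pysam are not covered, because the first two are located by explicit paths in the workflow and the last is a Python module.

```shell
#!/bin/sh
# Report whether each required command-line tool is reachable on PATH.
# The command names in the loop below are assumptions; adjust them to your installation.

check_tool() {
    # Print the resolved location and return 0 if "$1" is on PATH, otherwise return 1.
    if command -v "$1" >/dev/null 2>&1; then
        printf 'found    %s -> %s\n' "$1" "$(command -v "$1")"
        return 0
    fi
    printf 'MISSING  %s\n' "$1"
    return 1
}

missing=0
for tool in STAR fastqc python3 bpipe Rscript; do
    check_tool "$tool" || missing=$((missing + 1))
done
printf '%s tool(s) missing\n' "$missing"
```

The sketch only tests for presence; the version of each tool still has to be compared against the minimum versions listed above.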


Data preparation and estimation of gene expression 


1. Clone our analysis repository:

git clone https://github.com/sirusb/2CLike_analysis.git
cd 2CLike_analysis/Mapping/BulkRNASeq/

2. Prepare the genomic annotation following the protocol described here.
3. Prepare the meta-genome annotation following the protocol described here.
4. Download the bulk RNA-seq fastq files from GSE133232.
5. Make sure that the paired-end fastq files of each sample are in a separate directory. For the script to work smoothly, the directories should be named as shown below. Edit the variable "RootDir" to point to the path of the main folder that contains the fastq samples.

// Directory that contains the RNA-Seq data
RootDir="/n/groups/zhanglab/nadhir/Dux_Bulk/fastq"

// Define your samples here
// Format:
// SampleName : ["fastq_location"]
def branches = [
    //D0
    D0WT_Rep1: ["${RootDir}/D0_WT/wt0_1_combined.fq.gz"],
    D0WT_Rep2: ["${RootDir}/D0_WT/wt0_2_combined.fq.gz"],

    //D1 Pos
    D1_2CPos_Rep1: ["${RootDir}/D1_2C_Pos/2c1_combined.fq.gz"],
    D1_2CPos_Rep2: ["${RootDir}/D1_2C_Pos/2c2_combined.fq.gz"],

    //D1 Neg
    D1_2CNeg_Rep1: ["${RootDir}/D1_2C_Neg/wt1_combined.fq.gz"],
    D1_2CNeg_Rep2: ["${RootDir}/D1_2C_Neg/wt2_combined.fq.gz"],

    //D2 Pos
    D2_2CPos_Rep1: ["${RootDir}/D2_2C_Pos/2c2c1_combined.fq.gz"],
    D2_2CPos_Rep2: ["${RootDir}/D2_2C_Pos/2c2c2_combined.fq.gz"],

    //D2 Neg
    D2_2CNeg_Rep1: ["${RootDir}/D2_2C_Neg/2cwt1_combined.fq.gz"],
    D2_2CNeg_Rep2: ["${RootDir}/D2_2C_Neg/2cwt2_combined.fq.gz"]
]

6. If you made any changes, make sure to edit the "branches" variable in our script: Mapping/BulkRNASeq/RNASeq_workflow.groovy
7. Check that the sample paths are correct by running:
bpipe test RNASeq_workflow.groovy
8. Edit the different software paths and the locations of the previously generated .gtf and STAR genomic indexes in the file Mapping/BulkRNASeq/RNASeq_workflow.groovy:

// The mm10 genome
GENOME="/home/nadhir/genomes/mm10/STAR_SynDux_tdTomato_INDEX"
genome_refFlat="/home/nadhir/genomes/mm10/STAR_SynDux_tdTomato_INDEX/GRCm38.85_SynDux_TdTomato.refFlat"
gtf="/home/nadhir/genomes/mm10/STAR_SynDux_tdTomato_INDEX/Mus_musculus.GRCm38.85_fixed_SynDux_TdTomato.gtf"

// Pseudo-genome index
REPEATS_GENOME="/n/groups/zhanglab/nadhir/Pseudogenome_STARINDEX/"
REPEATS_GTF="/n/groups/zhanglab/nadhir/Pseudogenome_STARINDEX/repeats.gtf"

// Location of the Drop-seq tools
DropSeqTools="/home/nadhir/tools/Drop-seq_tools-1.13"

// Location of Trimmomatic
Trimmomatic_root="/n/app/trimmomatic/0.36/bin"
Trimmomatic_LOC="$Trimmomatic_root/trimmomatic-0.36.jar"

9. In the Trimmomatic analysis block, make sure to set the correct paths to the Trimmomatic .fa files that contain the adapter sequences.

Trimmomatic = {

    doc title: "Lunching Trimmomatic",
        desc: "Using Trimmomatic to remove leading and tailing low quality
        author: "djek.nad@gmail.com"

    var SOUT : "./Trimmomatic"
    var threds: 10
    var TYPE: "PE"
    var adapterPE: "/home/nadhir/tools/trimmomatic/all-PE.fa"
    var adapterSE: "/home/nadhir/tools/trimmomatic/all-SE.fa"
    var MINLEN: 30

    def basename = new File(input1).getName().prefix
    def extension = (input1 =~ /[^\.]*$/)[0]


10. You can then run the pipeline using bpipe as follows:
bpipe run -r RNASeq_workflow.groovy

11. If an error occurs during an intermediate stage (e.g., an incorrect software path), fix it and run "bpipe retry" so that the pipeline resumes from the most recent successful stage.
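
Before launching the pipeline, it can help to confirm that the fastq layout described in steps 5 and 6 is actually in place. The sketch below is one possible pre-flight check, not part of the repository. The helper `count_missing` and the `ROOT_DIR` environment variable are hypothetical names introduced here, and the relative paths are copied from the sample listing, so adjust them if your directories differ.

```shell
#!/bin/sh
# Pre-flight check: confirm that every fastq file listed in "branches" exists and is non-empty.
# ROOT_DIR mirrors the RootDir variable of the workflow and can be overridden from the environment.
ROOT_DIR="${ROOT_DIR:-/n/groups/zhanglab/nadhir/Dux_Bulk/fastq}"

count_missing() {
    # Usage: count_missing ROOT REL_PATH...
    # Writes one line per missing or empty file to stderr
    # and the number of such files to stdout.
    root=$1
    shift
    n=0
    for rel in "$@"; do
        if [ ! -s "$root/$rel" ]; then
            printf 'missing or empty: %s/%s\n' "$root" "$rel" >&2
            n=$((n + 1))
        fi
    done
    printf '%s\n' "$n"
}

n_missing=$(count_missing "$ROOT_DIR" \
    D0_WT/wt0_1_combined.fq.gz     D0_WT/wt0_2_combined.fq.gz \
    D1_2C_Pos/2c1_combined.fq.gz   D1_2C_Pos/2c2_combined.fq.gz \
    D1_2C_Neg/wt1_combined.fq.gz   D1_2C_Neg/wt2_combined.fq.gz \
    D2_2C_Pos/2c2c1_combined.fq.gz D2_2C_Pos/2c2c2_combined.fq.gz \
    D2_2C_Neg/2cwt1_combined.fq.gz D2_2C_Neg/2cwt2_combined.fq.gz)
printf '%s of 10 fastq files missing under %s\n' "$n_missing" "$ROOT_DIR"
```

This complements "bpipe test" rather than replacing it: the sketch only looks at the files on disk, while "bpipe test" exercises the workflow definition itself.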


Differential gene expression analysis: 


1. The analysis script used for the bulk RNA-seq differential gene expression analysis can be found in our GitHub repository at: 2CLike_analysis/Analysis/bulkRNA_analysis.Rmd

2. It is advised to open RStudio, run the script step by step, and edit the different paths as you go.

3. Before running the script, make sure that you have the following packages installed:
BiocManager::install(c("edgeR", "ggplot2", "gridExtra", "eulerr", "VennDiagram", "rtracklayer", "RColorBrewer", "ggsci", "ggrepel"))

4. Make sure to put the correct path to the folder containing the gene expression results generated by the previous script in the variable rootDir.
rootDir <- "D:/Projects/Dux_project/Data/BulkRNASeq"

5. You can then run the script step by step in RStudio.
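
To see which of the packages from step 3 are still missing before opening the script, a small shell wrapper around Rscript is one option. This is only a sketch: the helper `r_vector` is a hypothetical name introduced here, and it assumes that Rscript is on PATH and uses the same library as your RStudio session.

```shell
#!/bin/sh
# List the R packages from step 3 that are not yet installed.
PKGS="edgeR ggplot2 gridExtra eulerr VennDiagram rtracklayer RColorBrewer ggsci ggrepel"

r_vector() {
    # Turn the arguments into an R character vector literal, e.g. c("a","b").
    out=""
    for p in "$@"; do
        out="${out:+$out,}\"$p\""
    done
    printf 'c(%s)\n' "$out"
}

# $PKGS is left unquoted on purpose so that the shell splits it into one argument per package.
R_EXPR="pkgs <- $(r_vector $PKGS); cat(setdiff(pkgs, rownames(installed.packages())), sep = '\n')"

if command -v Rscript >/dev/null 2>&1; then
    # Prints one missing package per line; prints nothing if all are installed.
    Rscript -e "$R_EXPR" || echo "Rscript failed; check your R installation" >&2
else
    echo "Rscript not found on PATH; cannot check R packages" >&2
fi
```

Any package names printed can then be passed to BiocManager::install as in step 3.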